Sequence Comparison and Stochastic Model Based on Multi-order Markov Models

Authors

  • Xiang Fang
  • Guoqing Lu
  • Walter W. Stroup
  • Shunpu Zhang
Abstract

This dissertation presents two statistical methodologies built on multi-order Markov models. First, we introduce an alignment-free sequence comparison method that represents a sequence using a multi-order transition matrix (MTM). The MTM contains information on multi-order dependencies and provides a comprehensive representation of the heterogeneous composition within a sequence. Based on the MTM, a distance measure is developed for pairwise comparison of sequences. The new method is compared with the traditional maximum likelihood (ML) method, the complete composition vector (CCV) method, and the improved complete composition vector (ICCV) method using simulated sequences. We further illustrate the application of the MTM method using two real data sets: influenza A virus hemagglutinin gene sequences and complete mitochondrial genome sequences.

We then present a stochastic model named the Multi-Order Markov Model under Hidden States (MMMHS) for representing heterogeneous sequences. MMMHS is similar to the conventional Hidden Markov Model (HMM) and the Double Chain Markov Model (DCMM) in that it uses hidden states to describe the non-homogeneity of a sequence, but it provides a more flexible dependency structure by changing the order of Markov dependency under different hidden states. We extend the forward-backward procedure to MMMHS and provide the complete model estimation procedure based on the Expectation-Maximization (EM) algorithm. The method is then illustrated with applications to several real data sets, and the results are compared with those of traditional methods.

... supervisory committee. A special thank you goes out to my family and friends, especially my parents and wife, for their love and support.

  • Figure 2.1 Demonstration of (a) DCMM and (b) HMM.
  • Figure 3.1 The pre-defined relationship used to generate simulated sequences.
  • Figure 3.2 Average scores of a transition probability as a function of the order of the MTM, obtained using (a) the simulated dataset, (b) influenza A viral hemagglutinin gene sequences, and (c) the complete mitochondrial genome sequences of eutherian and non-eutherian organisms.
  • Figure 3.3 Consensus trees obtained from the 1000 simulated sequences using (a) maximum likelihood, (b1)-(b3) the CCV method with selected strings of size 500, 2500 and 5000, (b4) the CCV method, (c1)-(c8) the MTM method with K including the selected 1, 3, 5, 6, 7, 8, 9, and 11 orders with highest average score, and (c9) MTM method with
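The idea of representing a sequence by transition probabilities of several Markov orders, as the abstract describes for the MTM, can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function names (`transition_probs`, `multi_order_vector`, `distance`), the default set of orders, and the use of plain Euclidean distance are all assumptions made for illustration, since the abstract does not specify the actual MTM construction or its distance measure.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def transition_probs(seq, order):
    """Estimate P(next base | preceding `order` bases) by relative frequency."""
    counts = Counter()
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        counts[(context, seq[i + order])] += 1
    probs = {}
    # Enumerate every possible context so all sequences share the same keys.
    for context in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[(context, b)] for b in ALPHABET)
        for b in ALPHABET:
            probs[(context, b)] = counts[(context, b)] / total if total else 0.0
    return probs


def multi_order_vector(seq, orders):
    """Concatenate transition probabilities of several orders into one vector."""
    vec = []
    for k in orders:
        probs = transition_probs(seq, k)
        vec.extend(probs[key] for key in sorted(probs))
    return vec


def distance(seq_a, seq_b, orders=(1, 2, 3)):
    """Euclidean distance between the two vectors (a stand-in, not the MTM distance)."""
    va = multi_order_vector(seq_a, orders)
    vb = multi_order_vector(seq_b, orders)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

Calling `distance(seq_a, seq_b)` on two DNA strings returns a non-negative number that is zero when both sequences yield the same estimated transition probabilities. Because no alignment is computed, the cost grows with sequence length and with the number of contexts, which is 4^k for order k.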


Similar resources

Distributed Generation Expansion Planning Considering Load Growth Uncertainty: A Novel Multi-Period Stochastic Model

Abstract – Distributed generation (DG) technology is known as an efficient solution for distribution system planning (DSP) problems. Load growth uncertainty in the distribution network is a significant source of uncertainty that strongly affects the optimal management of DGs. To handle this problem, a novel model is proposed in this paper based on DG solution, consideri...


On Calibration and Application of Logit-Based Stochastic Traffic Assignment Models

There is a growing recognition that discrete choice models are capable of providing a more realistic picture of route choice behavior. In particular, influential factors other than travel time that are found to affect the choice of route trigger the application of random utility models in the route choice literature. This paper focuses on path-based, logit-type stochastic route choice models, i...


IMAGE SEGMENTATION USING GAUSSIAN MIXTURE MODEL

  Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models play a key role in probabilistic data analysis. In this paper, we have fitted a Gaussian mixture model to the pixels of an image. The parameters of the model are estimated by the EM algorithm. In addition, the label corresponding to each pixel of the true image is assigned by Bayes' rule. In fact, ...


MHIDCA: Multi Level Hybrid Intrusion Detection and Continuous Authentication for MANET Security

Mobile ad-hoc networks have attracted a great deal of attention over the past few years. Given their applications, security is of great significance in them. The use of security schemes that include prevention and detection is worth considering. In this paper, a method is presented that includes a multi-level security scheme to identify intrusion by sensors and authe...


Image Segmentation using Gaussian Mixture Model

Abstract: Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models play a key role in probabilistic data analysis. In this paper, we applied a Gaussian mixture model to the pixels of an image. The parameters of the model were estimated by the EM algorithm. In addition, the label corresponding to each pixel of the true image was assigned by Bayes' rule. In fact,...




Publication date: 2016